Learning Equivalence Classes of Bayesian Network Structures
Approaches to learning Bayesian networks from data typically combine a
scoring function with a heuristic search procedure. Given a Bayesian network
structure, many of the scoring functions derived in the literature return a
score for the entire equivalence class to which the structure belongs. When
using such a scoring function, it is appropriate for the heuristic search
algorithm to search over equivalence classes of Bayesian networks as opposed to
individual structures. We present the general formulation of a search space for
which the states of the search correspond to equivalence classes of structures.
Using this space, any one of a number of heuristic search algorithms can easily
be applied. We compare greedy search performance in the proposed search space
to greedy search performance in a search space for which the states correspond
to individual Bayesian network structures.
Comment: Appears in Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence (UAI 1996).
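As an illustration of the baseline search space over individual structures, here is a minimal Python sketch of greedy hill-climbing with single-edge additions, deletions, and reversals. The edge-set representation and the score interface are assumptions for the example, not the paper's implementation; a real system would plug in a decomposable criterion such as BIC or a Bayesian score.

import itertools

def neighbors(dag, nodes):
    """Yield edge sets reachable by one edge addition, deletion, or reversal."""
    for x, y in itertools.permutations(nodes, 2):
        if (x, y) in dag:
            yield dag - {(x, y)}                  # delete x -> y
            yield (dag - {(x, y)}) | {(y, x)}     # reverse to y -> x
        elif (y, x) not in dag:
            yield dag | {(x, y)}                  # add x -> y

def is_acyclic(dag, nodes):
    """Kahn's algorithm: True iff the edge set has no directed cycle."""
    indeg = {n: 0 for n in nodes}
    for _, y in dag:
        indeg[y] += 1
    frontier = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while frontier:
        n = frontier.pop()
        seen += 1
        for x, y in dag:
            if x == n:
                indeg[y] -= 1
                if indeg[y] == 0:
                    frontier.append(y)
    return seen == len(nodes)

def greedy_search(nodes, score):
    """Hill-climb from the empty graph until no single-edge move improves."""
    current = frozenset()
    while True:
        better = [g for g in neighbors(current, nodes)
                  if is_acyclic(g, nodes) and score(g) > score(current)]
        if not better:
            return current
        current = max(better, key=score)

Searching over equivalence classes instead replaces these DAG-local moves with operators on class representatives, so a class-level score is computed once per class rather than once per member.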
A Transformational Characterization of Equivalent Bayesian Network Structures
We present a simple characterization of equivalent Bayesian network
structures based on local transformations. The significance of the
characterization is twofold. First, we are able to easily prove several new
invariant properties of theoretical interest for equivalent structures. Second,
we use the characterization to derive an efficient algorithm that identifies
all of the compelled edges in a structure. Compelled edge identification is of
particular importance for learning Bayesian network structures from data
because these edges indicate causal relationships when certain assumptions
hold.
Comment: Appears in Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (UAI 1995).
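The local transformation in question is reversal of a covered edge: X -> Y is covered when Y's parents, apart from X itself, coincide exactly with X's parents, and reversing such an edge preserves equivalence. A minimal sketch of the test, with an illustrative edge-set representation:

def parents(dag, node):
    return {x for (x, y) in dag if y == node}

def is_covered(dag, x, y):
    """True iff x -> y is covered: Pa(y) = Pa(x) union {x}."""
    assert (x, y) in dag
    return parents(dag, y) == parents(dag, x) | {x}

# Example: in a -> b, a -> c, b -> c, the edge b -> c is covered
# (Pa(c) = {a, b} = Pa(b) union {b}), so reversing it preserves equivalence.
dag = {("a", "b"), ("a", "c"), ("b", "c")}
print(is_covered(dag, "b", "c"))  # True
print(is_covered(dag, "a", "b"))  # True: Pa(b) = {a} = Pa(a) union {a}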
Fast Learning from Sparse Data
We describe two techniques that significantly improve the running time of
several standard machine-learning algorithms when data is sparse. The first
technique is an algorithm that efficiently extracts one-way and two-way counts, either real or expected, from discrete data. Extracting such counts is
a fundamental step in learning algorithms for constructing a variety of models
including decision trees, decision graphs, Bayesian networks, and naive-Bayes
clustering models. The second technique is an algorithm that efficiently
performs the E-step of the EM algorithm (i.e. inference) when applied to a
naive-Bayes clustering model. Using real-world data sets, we demonstrate a
dramatic decrease in running time for algorithms that incorporate these
techniques.
Comment: Appears in Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI 1999).
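To sketch the flavor of the first technique (the data layout here is an assumption for illustration, not the paper's algorithm): when each record stores only its non-default values, one-way counts for the default state can be recovered by subtraction instead of being tallied record by record, so the work scales with the number of non-default entries.

from collections import Counter, defaultdict

def one_way_counts(sparse_records, n_records, defaults):
    """sparse_records: iterable of dicts {var: non-default value}."""
    counts = defaultdict(Counter)
    for rec in sparse_records:
        for var, val in rec.items():
            counts[var][val] += 1
    # Every record not mentioning var had its default value.
    for var, default in defaults.items():
        non_default = sum(counts[var].values())
        counts[var][default] += n_records - non_default
    return counts

records = [{"buys": 1}, {"buys": 1, "clicked": 1}, {}]
counts = one_way_counts(records, 3, {"buys": 0, "clicked": 0})
print(counts["buys"])     # Counter({1: 2, 0: 1})
print(counts["clicked"])  # Counter({0: 2, 1: 1})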
A Decision Theoretic Approach to Targeted Advertising
A simple advertising strategy that can be used to help increase sales of a
product is to mail out special offers to selected potential customers. Because
there is a cost associated with sending each offer, the optimal mailing
strategy depends on both the benefit obtained from a purchase and how the offer
affects the buying behavior of the customers. In this paper, we describe two
methods for partitioning the potential customers into groups, and show how to
perform a simple cost-benefit analysis to decide which, if any, of the groups
should be targeted. In particular, we consider two decision-tree learning
algorithms. The first is an "off the shelf" algorithm used to model the
probability that groups of customers will buy the product. The second is a new
algorithm that is similar to the first, except that for each group, it
explicitly models the probability of purchase under the two mailing scenarios:
(1) the mail is sent to members of that group and (2) the mail is not sent to
members of that group. Using data from a real-world advertising experiment, we
compare the algorithms to each other and to a naive mail-to-all strategy.
Comment: Appears in Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI 2000).
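The cost-benefit analysis for the second algorithm reduces, for each group, to comparing the expected incremental profit of mailing against the cost of the mailing itself. A minimal sketch with made-up numbers (the two probabilities would come from the learned models):

def should_mail(p_buy_if_mailed, p_buy_if_not, benefit_per_sale, cost_per_mail):
    """Mail iff the expected lift in profit per customer exceeds the cost."""
    expected_lift = (p_buy_if_mailed - p_buy_if_not) * benefit_per_sale
    return expected_lift > cost_per_mail

print(should_mail(0.05, 0.02, 40.0, 0.50))   # True: lift $1.20 > $0.50
print(should_mail(0.05, 0.045, 40.0, 0.50))  # False: lift $0.20 < $0.50

Note that a group with a high purchase probability under both scenarios is not worth mailing: only the difference between the two scenarios matters.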
Efficient Approximations for the Marginal Likelihood of Incomplete Data Given a Bayesian Network
We discuss Bayesian methods for learning Bayesian networks when data sets are
incomplete. In particular, we examine asymptotic approximations for the
marginal likelihood of incomplete data given a Bayesian network. We consider
the Laplace approximation and the less accurate but more efficient BIC/MDL
approximation. We also consider approximations proposed by Draper (1993) and
Cheeseman and Stutz (1995). These approximations are as efficient as BIC/MDL,
but their accuracy has not been studied in any depth. We compare the accuracy
of these approximations under the assumption that the Laplace approximation is
the most accurate. In experiments using synthetic data generated from discrete
naive-Bayes models having a hidden root node, we find that the Cheeseman-Stutz (CS) measure is the most accurate.
Comment: Appears in Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence (UAI 1996).
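For reference, the BIC/MDL approximation penalizes the maximized log-likelihood by half the number of free parameters times the log of the sample size. A minimal sketch with a placeholder likelihood value:

import math

def bic(loglik_at_mle, n_params, n_cases):
    """log p(D | S) ~ log p(D | theta_hat, S) - (d / 2) * log N."""
    return loglik_at_mle - 0.5 * n_params * math.log(n_cases)

# e.g. a model with 9 free parameters fit to 1000 cases:
print(bic(-2100.0, 9, 1000))  # about -2131.09

The CS measure instead starts from the marginal likelihood of the completed data and corrects it by the ratio of the incomplete-data to complete-data likelihoods at the fitted parameters, at essentially the same cost as BIC/MDL.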
Finding Optimal Bayesian Networks
In this paper, we derive optimality results for greedy Bayesian-network
search algorithms that perform single-edge modifications at each step and use
asymptotically consistent scoring criteria. Our results extend those of Meek
(1997) and Chickering (2002), who demonstrate that in the limit of large
datasets, if the generative distribution is perfect with respect to a DAG
defined over the observable variables, such search algorithms will identify
this optimal (i.e. generative) DAG model. We relax their assumption about the
generative distribution, and assume only that this distribution satisfies the
composition property over the observable variables, which is a more
realistic assumption for real domains. Under this assumption, we guarantee that
the search algorithms identify an inclusion-optimal model; that is, a
model that (1) contains the generative distribution and (2) has no sub-model
that contains this distribution. In addition, we show that the composition
property is guaranteed to hold whenever the dependence relationships in the
generative distribution can be characterized by paths between singleton
elements in some generative graphical model (e.g. a DAG, a chain graph, or a
Markov network) even when the generative model includes unobserved variables,
and even when the observed data is subject to selection bias.
Comment: Appears in Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI 2002).
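For readers unfamiliar with the term, the composition property can be stated in standard graphoid form (paraphrased here, not quoted from the paper): for disjoint variable sets X, Y, W, and Z, separate conditional independencies combine, i.e.

% Composition property, in graphoid notation:
X \perp Y \mid Z \quad \text{and} \quad X \perp W \mid Z
\quad \Longrightarrow \quad X \perp (Y \cup W) \mid Z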
Selective Greedy Equivalence Search: Finding Optimal Bayesian Networks Using a Polynomial Number of Score Evaluations
We introduce Selective Greedy Equivalence Search (SGES), a restricted version
of Greedy Equivalence Search (GES). SGES retains the asymptotic correctness of
GES but, unlike GES, has polynomial performance guarantees. In particular, we
show that when data are sampled independently from a distribution that is
perfect with respect to a DAG G defined over the observable variables, then, in the limit of large data, SGES will identify G's equivalence
class after a number of score evaluations that is (1) polynomial in the number
of nodes and (2) exponential in various complexity measures including
maximum-number-of-parents, maximum-clique-size, and a new measure called v-width that is at least as small as, and potentially much smaller than, the other two. More generally, we show that for any hereditary and equivalence-invariant property Π known to hold in G, we retain the large-sample optimality guarantees of GES even if we ignore any GES deletion operator during the backward phase that results in a state for which Π does not hold in the common-descendants subgraph.
Comment: Full version of UAI paper.
Large-Sample Learning of Bayesian Networks is NP-Hard
In this paper, we provide new complexity results for algorithms that learn
discrete-variable Bayesian networks from data. Our results apply whenever the
learning algorithm uses a scoring criterion that favors the simplest model able
to represent the generative distribution exactly. Our results therefore hold
whenever the learning algorithm uses a consistent scoring criterion and is
applied to a sufficiently large dataset. We show that identifying high-scoring
structures is hard, even when we are given an independence oracle, an inference
oracle, and/or an information oracle. Our negative results also apply to the
learning of discrete-variable Bayesian networks in which each node has at most
k parents, for all k > 3.
Comment: Appears in Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI 2003).
Learning Bayesian Networks: The Combination of Knowledge and Statistical Data
We describe algorithms for learning Bayesian networks from a combination of
user knowledge and statistical data. The algorithms have two components: a
scoring metric and a search procedure. The scoring metric takes a network
structure, statistical data, and a user's prior knowledge, and returns a score
proportional to the posterior probability of the network structure given the
data. The search procedure generates networks for evaluation by the scoring
metric. Our contributions are threefold. First, we identify two important
properties of metrics, which we call event equivalence and parameter
modularity. These properties have been mostly ignored, but when combined,
greatly simplify the encoding of a user's prior knowledge. In particular, a
user can express her knowledge, for the most part, as a single prior Bayesian
network for the domain. Second, we describe local search and annealing
algorithms to be used in conjunction with scoring metrics. In the special case
where each node has at most one parent, we show that heuristic search can be
replaced with a polynomial algorithm to identify the networks with the highest
score. Third, we describe a methodology for evaluating Bayesian-network
learning algorithms. We apply this approach to a comparison of metrics and
search procedures.
Comment: Appears in Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence (UAI 1994).
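For the one-parent special case, one standard polynomial-time route (shown here as an illustration; it assumes a decomposable, score-equivalent metric and is not necessarily the paper's exact algorithm) is to reduce the problem to a maximum-weight spanning tree over pairwise score gains:

import networkx as nx

def best_tree(nodes, gain):
    """gain(u, v): symmetric score improvement of linking u and v."""
    g = nx.Graph()
    g.add_nodes_from(nodes)
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            w = gain(u, v)
            if w > 0:                       # keep only beneficial links
                g.add_edge(u, v, weight=w)
    return nx.maximum_spanning_tree(g)      # a forest if g is disconnected

# Orienting the resulting tree away from an arbitrary root yields a
# network in which every node has at most one parent.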
A Bayesian Approach to Learning Bayesian Networks with Local Structure
Recently several researchers have investigated techniques for using data to
learn Bayesian networks containing compact representations for the conditional
probability distributions (CPDs) stored at each node. The majority of this work
has concentrated on using decision-tree representations for the CPDs. In
addition, researchers typically apply non-Bayesian (or asymptotically Bayesian)
scoring functions such as MDL to evaluate the goodness-of-fit of networks to
the data. In this paper we investigate a Bayesian approach to learning Bayesian
networks that contain the more general decision-graph representations of the
CPDs. First, we describe how to evaluate the posterior probability (that is, the Bayesian score) of such a network, given a database of observed cases. Second,
we describe various search spaces that can be used, in conjunction with a
scoring function and a search procedure, to identify one or more high-scoring
networks. Finally, we present an experimental evaluation of the search spaces,
using a greedy algorithm and a Bayesian scoring function.
Comment: Appears in Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI 1997).
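To make the representations concrete (the layout is an assumption for illustration, not the paper's): a decision-tree CPD tests one parent at each internal node and stores a distribution over the child at each leaf; a decision graph generalizes this by allowing distinct paths to share a leaf, capturing equality constraints a tree cannot.

class Leaf:
    def __init__(self, dist):
        self.dist = dist                 # {child_value: probability}

class Split:
    def __init__(self, parent, children):
        self.parent = parent             # name of the parent tested here
        self.children = children         # {parent_value: subtree}

def cpd_lookup(node, parent_values):
    """Walk the tree to the leaf distribution for this configuration."""
    while isinstance(node, Split):
        node = node.children[parent_values[node.parent]]
    return node.dist

# P(alarm | burglary, earthquake), ignoring earthquake when burglary = 1:
tree = Split("burglary", {
    1: Leaf({1: 0.95, 0: 0.05}),
    0: Split("earthquake", {1: Leaf({1: 0.3, 0: 0.7}),
                            0: Leaf({1: 0.01, 0: 0.99})}),
})
print(cpd_lookup(tree, {"burglary": 0, "earthquake": 1}))  # {1: 0.3, 0: 0.7}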